-
Notifications
You must be signed in to change notification settings - Fork 156
add io offload prototype #1774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add io offload prototype #1774
Conversation
vithikashah001
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs documentation for variables, subroutines and functions.
uramirez8707
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes, but I think this is a good start
offloading/offloading_io.F90
Outdated
| integer, parameter :: non_domain_decomposed = 1 !< enumeration for type of domain | ||
| integer, parameter :: Unstructured_grid = 2 !< enumeration for type of domain | ||
|
|
||
| integer, parameter :: max_files = 10 !< amount of offloaded files to allocate space for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO Make this a namelist, runtime constant, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added with 4a5ed59
| !! Registers an axis to a netcdf file on offloaded PEs. File must have been opened with open_file_offload. | ||
| interface register_axis_offload | ||
| procedure :: register_netcdf_axis_offload | ||
| procedure :: register_domain_axis_offload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO we need a register_unstructured_axis_offload for the unstructured grid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added any leftover TODOs as comments with 969513e
offloading/offloading_io.F90
Outdated
|
|
||
| !< The number of tiles must be the same as the number of offloading pes | ||
| !if (size(pe_out) .ne. ntile ) & | ||
| ! call mpp_error(FATAL, "The number of offloading PEs must be the same as the number of tiles of the domain") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added back, and it should crash if the ntiles is not a multiple of the ntiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added with 969513e
| call mpp_broadcast(filename_out, str_len, pe_in(1)) | ||
| call mpp_broadcast(global_domain_size, size(global_domain_size), pe_in(1)) | ||
| call mpp_broadcast(ntile, pe_in(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO maybe we can find a way to do this all in one mpp_brodcast call (or mpp_send/mpp_receive)
offloading/offloading_io.F90
Outdated
| integer, allocatable :: all_current_pes(:) | ||
| integer, allocatable :: broadcasting_pes(:) | ||
| logical :: is_model_pe | ||
| character(len=255) :: var_info(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to use the max str length variable
| if (.not. is_model_pe) then | ||
| select type(wut=>this%fileobj) | ||
| type is(FmsNetcdfDomainFile_t) | ||
| call register_axis(wut, trim(var_info(1)), trim(var_info(2))) | ||
| end select | ||
| endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clean this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offloading/offloading_io.F90
Outdated
| type(domain2D) :: domain_out | ||
| type(domain2D) :: domain_in | ||
| integer :: isc, iec, jsc, jec, nz, redistribute_clock | ||
| character(len=255) :: varname_tmp(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use max str length
| !> Structure to hold offloading file information | ||
| type :: offloading_obj_out | ||
| integer :: id | ||
| character(len=:), allocatable :: filename !< filename of the offloaded netcdf file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to store the filename here, as opposed to getting it from fileobj%path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No looks like it can be removed, I'll add that change.
offloading/offloading_io.F90
Outdated
| character(len=:), allocatable :: filename !< filename of the offloaded netcdf file | ||
| class(FmsNetcdfFile_t), allocatable :: fileobj !< fms2_io file object | ||
| type(domain2D) :: domain_out !< domain on offloading PEs | ||
| integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem that type_of_domain is used. Is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to remove this as well.
offloading/offloading_io.F90
Outdated
| integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid) | ||
| end type | ||
|
|
||
| !> Offload equivalent of open_file in fms2_io_mod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems like it belongs above open_file_offload.
offloading/offloading_io.F90
Outdated
| type_of_domain = domain_decomposed | ||
| type is (FmsNetcdfUnstructuredDomainFile_t) | ||
| type_of_domain = Unstructured_grid | ||
| end select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type_of_domain is assigned, but never used.
offloading/offloading_io.F90
Outdated
| endif | ||
|
|
||
| if (.not. is_model_pe) then | ||
| select type(wut=>this%fileobj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since FmsNetcdfDomainFile_t has no types derived from it, perhaps change fileobj from class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t), and remove the select type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uramirez8707 had a similar comment too I believe.
Since the main derived type (offloading_obj_out) uses a class(FmsNetcdfFile_t), i was thinking it might be cleaner to just make all the routines take class(FmsNetcdfFile_t) instead and just error out for now in case its not a supported file type.
I figured that makes it a bit easier to use and will give us some flexibility in the future, but lmk what you guys think.
| endif | ||
|
|
||
| if (.not. is_model_pe) then | ||
| select type(wut=>this%fileobj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment, regarding class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t).
offloading/offloading_io.F90
Outdated
| select type(wut=>this%fileobj) | ||
| type is(FmsNetcdfDomainFile_t) | ||
| call register_field(wut, trim(var_info(1)), trim(var_info(2)), var_info(3:)) | ||
| end select |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to handle the case where fileobj is FmsNetcdfFile_t as opposed to FmsNetcdfDomainFile_t - is this case supported?
| !! This routine should be called from both the model PEs and the offload PEs, with the full list | ||
| !! of pes for each group being provided. The model PEs will broadcast the filename and domain. | ||
| subroutine open_file_offload(fileobj, filename, domain_in, pe_in, pe_out) | ||
| class(FmsNetcdfFile_t), intent(inout) :: fileobj !< fms2_io file object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other subroutines (e.g., write_data_offload) only seem to support FmsNetcdfDomainFile_t and not the FmsNetcdfFile_t base type. If non-domain-decomposed files are unsupported, type(FmsNetcdfDomainFile_t) should probably be used consistently for fileobj arguments.
80b7ef9 to
969513e
Compare
|
@uramirez8707 @J-Lentz @vithikashah001 This should be ready for another pass at reviews, I think every comment should be covered. See #1774 (comment) for how i handled the typing, i just made everything take the base |
uramirez8707
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for now, but we have a lot of clean up to do.
Let’s plan the next steps next week.
Description
draft for the io offloading changes
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also note
any relevant details for your test configuration (e.g. compiler, OS). Include
enough information so someone can reproduce your tests.
Checklist:
make distcheckpasses